Information Capacity

Definition

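$$ \begin{aligned} C &= \max_{p_X} I(X; Y) \end{aligned} $$

Here $X$ is the channel input, $Y$ is the channel output, $I(X; Y)$ is their mutual information, and the maximum is taken over all input distributions $p_X$.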
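To make the maximization concrete, here is a minimal Python sketch that evaluates $I(X; Y)$ for a given input distribution and searches for the maximizing one. The binary symmetric channel with crossover probability 0.1, the function names, and the grid search over $P(X=0)$ are assumptions chosen for illustration; none of them come from the notes above, and a grid search only works for a binary input alphabet.

```python
import math


def mutual_information(p_x, channel):
    """Return I(X;Y) in bits.

    p_x[x] is P(X = x); channel[x][y] is P(Y = y | X = x).
    """
    n_out = len(channel[0])
    # Output marginal: p(y) = sum_x p(x) * P(y | x)
    p_y = [
        sum(p_x[x] * channel[x][y] for x in range(len(p_x)))
        for y in range(n_out)
    ]
    total = 0.0
    for x, px in enumerate(p_x):
        for y in range(n_out):
            joint = px * channel[x][y]
            if joint > 0:  # terms with zero probability contribute nothing
                total += joint * math.log2(channel[x][y] / p_y[y])
    return total


def capacity_binary_input(channel, steps=1000):
    """Approximate C = max over p_X of I(X;Y) by a grid search on q = P(X = 0).

    Returns (capacity estimate in bits, maximizing q).
    """
    candidates = (i / steps for i in range(steps + 1))
    return max((mutual_information([q, 1 - q], channel), q) for q in candidates)


# Illustrative channel (an assumption): binary symmetric, crossover 0.1
bsc = [[0.9, 0.1],
       [0.1, 0.9]]
capacity, best_q = capacity_binary_input(bsc)
print(f"C is about {capacity:.4f} bits per use, at P(X=0) = {best_q}")
```

For this symmetric channel the search settles on the uniform input, and the value agrees with the closed form $1 - H_b(0.1)$, where $H_b$ is the binary entropy function. For larger input alphabets, an iterative method such as the Blahut-Arimoto algorithm would be the usual choice instead of a grid.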

An Interpretation

Each input $n$-sequence $X^{n}$ corresponds to approximately $2^{nH(Y|X)}$ possible and equally likely output $n$-sequences $Y^{n}$.

The total number of possible $Y^{n}$ is approximately $2^{nH(Y)}$. To keep different inputs distinguishable, these outputs have to be divided into disjoint sets of size $2^{nH(Y|X)}$, one set for each input sequence $X^{n}$.

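The total number of disjoint sets is no more than $2^{n[H(Y) - H(Y|X)]} = 2^{n I(X; Y)}$. Hence at most about $2^{nI(X; Y)}$ input sequences of length $n$ can be distinguished at the output. Choosing $p_X$ to maximize $I(X; Y)$ makes this count as large as possible, which is why the capacity is defined as a maximum.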
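As a worked instance of this counting argument, consider a binary symmetric channel with crossover probability $p$ and a uniform input. This channel is an assumed example, not one taken from the notes above. With $H_b(p) = -p \log_2 p - (1-p) \log_2 (1-p)$ denoting the binary entropy function:

$$ \begin{aligned} H(Y|X) &= H_b(p), \qquad H(Y) = 1, \\ \frac{2^{nH(Y)}}{2^{nH(Y|X)}} &= 2^{n[1 - H_b(p)]}. \end{aligned} $$

For $p = 0.1$, $H_b(0.1) \approx 0.469$, so roughly $2^{0.531n}$ input sequences of length $n$ can be told apart, which is about $0.531$ bits per channel use.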

by Jon